Identifier (computer Languages)
   HOME

TheInfoList



OR:

In computer
programming language A programming language is a system of notation for writing computer programs. Most programming languages are text-based formal languages, but they may also be graphical. They are a kind of computer language. The description of a programming ...
s, an identifier is a
lexical token In computer science, lexical analysis, lexing or tokenization is the process of converting a sequence of characters (such as in a computer program or web page) into a sequence of ''lexical tokens'' ( strings with an assigned and thus identified ...
(also called a
symbol A symbol is a mark, sign, or word that indicates, signifies, or is understood as representing an idea, object, or relationship. Symbols allow people to go beyond what is known or seen by creating linkages between otherwise very different conc ...
, but not to be confused with the symbol primitive data type) that names the language's entities. Some of the kinds of entities an identifier might denote include variables,
data type In computer science and computer programming, a data type (or simply type) is a set of possible values and a set of allowed operations on it. A data type tells the compiler or interpreter how the programmer intends to use the data. Most progra ...
s,
labels A label (as distinct from signage) is a piece of paper, plastic film, cloth, metal, or other material affixed to a container or product, on which is written or printed information or symbols about the product or item. Information printed dir ...
,
subroutine In computer programming, a function or subroutine is a sequence of program instructions that performs a specific task, packaged as a unit. This unit can then be used in programs wherever that particular task should be performed. Functions may ...
s, and
modules Broadly speaking, modularity is the degree to which a system's components may be separated and recombined, often with the benefit of flexibility and variety in use. The concept of modularity is used primarily to reduce complexity by breaking a sy ...
.


Lexical form

Which character sequences constitute identifiers depends on the
lexical grammar In computer science, a lexical grammar is a formal grammar defining the syntax of tokens. The program is written using characters that are defined by the lexical structure of the language used. The character set is equivalent to the alphabet used b ...
of the language. A common rule is
alphanumeric Alphanumericals or alphanumeric characters are a combination of alphabetical and numerical characters. More specifically, they are the collection of Latin letters and Arabic digits. An alphanumeric code is an identifier made of alphanumeric ch ...
sequences, with underscore also allowed (in some languages, _ is not allowed), and with the condition that it can not begin with a numerical digit (to simplify
lexing In computer science, lexical analysis, lexing or tokenization is the process of converting a sequence of characters (such as in a computer program or web page) into a sequence of ''lexical tokens'' ( strings with an assigned and thus identified ...
by avoiding confusing with
integer literal In computer science, an integer literal is a kind of literal for an integer whose value is directly represented in source code. For example, in the assignment statement x = 1, the string 1 is an integer literal indicating the value 1, while in the ...
s) – so foo, foo1, foo_bar, _foo are allowed, but 1foo is not – this is the definition used in earlier versions of C and
C++ C++ (pronounced "C plus plus") is a high-level general-purpose programming language created by Danish computer scientist Bjarne Stroustrup as an extension of the C programming language, or "C with Classes". The language has expanded significan ...
,
Python Python may refer to: Snakes * Pythonidae, a family of nonvenomous snakes found in Africa, Asia, and Australia ** ''Python'' (genus), a genus of Pythonidae found in Africa and Asia * Python (mythology), a mythical serpent Computing * Python (pro ...
, and many other languages. Later versions of these languages, along with many other modern languages, support many more
Unicode Unicode, formally The Unicode Standard,The formal version reference is is an information technology Technical standard, standard for the consistent character encoding, encoding, representation, and handling of Character (computing), text expre ...
characters in an identifier. However, a common restriction is not to permit whitespace characters and language operators; this simplifies tokenization by making it free-form and context-free. For example, forbidding + in identifiers due to its use as a binary operation means that a+b and a + b can be tokenized the same, while if it were allowed, a+b would be an identifier, not an addition. Whitespace in identifier is particularly problematic, as if spaces are allowed in identifiers, then a clause such as if rainy day then 1 is legal, with rainy day as an identifier, but tokenizing this requires the phrasal context of being in the condition of an if clause. Some languages do allow spaces in identifiers, however, such as
ALGOL 68 ALGOL 68 (short for ''Algorithmic Language 1968'') is an imperative programming language that was conceived as a successor to the ALGOL 60 programming language, designed with the goal of a much wider scope of application and more rigorously de ...
and some ALGOL variants – for example, the following is a valid statement: real half pi; which could be entered as .real. half pi; (keywords are represented in boldface, concretely via stropping). In ALGOL this was possible because keywords are syntactically differentiated, so there is no risk of collision or ambiguity, spaces are eliminated during the
line reconstruction In computing, a compiler is a computer program that translates computer code written in one programming language (the ''source'' language) into another language (the ''target'' language). The name "compiler" is primarily used for programs that ...
phase, and the source was processed via
scannerless parsing In computer science, scannerless parsing (also called lexerless parsing) performs tokenization (breaking a stream of characters into words) and parsing (arranging the words into phrases) in a single step, rather than breaking it up into a pipeli ...
, so lexing could be context-sensitive. In most languages, some character sequences have the lexical form of an identifier but are known as keywords – for example, if is frequently a keyword for an if clause, but lexically is of the same form as ig or foo namely a sequence of letters. This overlap can be handled in various ways: these may be forbidden from being identifiers – which simplifies tokenization and parsing – in which case they are
reserved words In a computer language, a reserved word (also known as a reserved identifier) is a word that cannot be used as an identifier, such as the name of a variable, function, or label – it is "reserved from use". This is a syntactic definition, and a re ...
; they may both be allowed but distinguished in other ways, such as via stropping; or keyword sequences may be allowed as identifiers and which sense is determined from context, which requires a context-sensitive lexer. Non-keywords may also be reserved words (forbidden as identifiers), particularly for
forward compatibility Forward compatibility or upward compatibility is a design characteristic that allows a system to accept input intended for a later version of itself. The concept can be applied to entire systems, electrical interfaces, telecommunication signals, d ...
, in case a word may become a keyword in future. In a few languages, e.g.,
PL/1 PL/I (Programming Language One, pronounced and sometimes written PL/1) is a procedural, imperative computer programming language developed and published by IBM. It is designed for scientific, engineering, business and system programming. I ...
, the distinction is not clear.


Semantics

The scope, or accessibility within a program of an identifier can be either local or global. A global identifier is declared outside of functions and is available throughout the program. A local identifier is declared within a specific function and only available within that function. For implementations of programming languages that are using a
compiler In computing, a compiler is a computer program that translates computer code written in one programming language (the ''source'' language) into another language (the ''target'' language). The name "compiler" is primarily used for programs that ...
, identifiers are often only
compile time In computer science, compile time (or compile-time) describes the time window during which a computer program is compiled. The term is used as an adjective to describe concepts related to the context of program compilation, as opposed to concept ...
entities. That is, at runtime the compiled program contains references to memory addresses and offsets rather than the textual identifier tokens (these memory addresses, or offsets, having been assigned by the compiler to each identifier). In languages that support
reflection Reflection or reflexion may refer to: Science and technology * Reflection (physics), a common wave phenomenon ** Specular reflection, reflection from a smooth surface *** Mirror image, a reflection in a mirror or in water ** Signal reflection, in ...
, such as interactive evaluation of source code (using an interpreter or an incremental compiler), identifiers are also runtime entities, sometimes even as
first-class object In programming language design, a first-class citizen (also type, object, entity, or value) in a given programming language is an entity which supports all the operations generally available to other entities. These operations typically include ...
s that can be freely manipulated and evaluated. In
Lisp A lisp is a speech impairment in which a person misarticulates sibilants (, , , , , , , ). These misarticulations often result in unclear speech. Types * A frontal lisp occurs when the tongue is placed anterior to the target. Interdental lisping ...
, these are called
symbols A symbol is a mark, sign, or word that indicates, signifies, or is understood as representing an idea, object, or relationship. Symbols allow people to go beyond what is known or seen by creating linkages between otherwise very different conc ...
. Compilers and interpreters do not usually assign any semantic meaning to an identifier based on the actual character sequence used. However, there are exceptions. For example: * In
Perl Perl is a family of two high-level, general-purpose, interpreted, dynamic programming languages. "Perl" refers to Perl 5, but from 2000 to 2019 it also referred to its redesigned "sister language", Perl 6, before the latter's name was offici ...
a variable is indicated using a prefix called a
sigil A sigil () is a type of symbol used in magic. The term has usually referred to a pictorial signature of a deity or spirit. In modern usage, especially in the context of chaos magic, sigil refers to a symbolic representation of the practitioner ...
, which specifies aspects of how the variable is interpreted in expressions. * In
Ruby A ruby is a pinkish red to blood-red colored gemstone, a variety of the mineral corundum ( aluminium oxide). Ruby is one of the most popular traditional jewelry gems and is very durable. Other varieties of gem-quality corundum are called sa ...
a variable is automatically considered immutable if its identifier starts with a capital letter. * In Fortran, the first letter in a variable's name indicates whether by default it is created as an
integer An integer is the number zero (), a positive natural number (, , , etc.) or a negative integer with a minus sign (−1, −2, −3, etc.). The negative numbers are the additive inverses of the corresponding positive numbers. In the language ...
or
floating point In computing, floating-point arithmetic (FP) is arithmetic that represents real numbers approximately, using an integer with a fixed precision, called the significand, scaled by an integer exponent of a fixed base. For example, 12.345 can be ...
variable. * In Go, the capitalization of the first letter of a variable's name determines its visibility (uppercase for public, lowercase for private). In some languages such as Go, identifiers uniqueness is based on their spelling and their visibility. In
HTML The HyperText Markup Language or HTML is the standard markup language for documents designed to be displayed in a web browser. It can be assisted by technologies such as Cascading Style Sheets (CSS) and scripting languages such as JavaScri ...
an identifier is one of the possible
attributes Attribute may refer to: * Attribute (philosophy), an extrinsic property of an object * Attribute (research), a characteristic of an object * Grammatical modifier, in natural languages * Attribute (computing), a specification that defines a proper ...
of an
HTML element An HTML element is a type of HTML (HyperText Markup Language) document component, one of several types of HTML nodes (there are also text nodes, comment nodes and others). The first used version of HTML was written by Tim Berners-Lee in 1993 ...
. It is unique within the document.


References

{{reflist


See also

*
Naming convention (programming) In computer programming, a naming convention is a set of rules for choosing the character sequence to be used for identifiers which denote variables, types, functions, and other entities in source code and documentation. Reasons for using a na ...
Programming language concepts Metadata Syntactic entities